Panama Paperes

The Panama Papers are 11.5 million leaked documents that detail financial and attorney–client information for 213.634 offshore entities.The documents, were created by, and taken from, Panamanian law firm and corporate service provider Mossack Fonseca, and were leaked in 2015 by an anonymous source.

Those involved contracted with the law firm, Mossack Fonseca, consultants for companies, services consisting in founding and establishing companies registered in a tax haven in such a way that they fulfilled the primary objective of hiding the identity of the owners. (Los implicados contrataban con el bufete de abogados consultores de empresas, Mossack Fonseca, servicios consistentes en fundar y establecer compañ??as inscritas en un para??so fiscal de modo tal que cumpliesen con el objetivo primario de «ocultar la identidad de los propietarios»)

The total size of the archives is about 2,6 TeraBytes.

Composed of:
Type Quantity
E-mail 480.461.817
Data base 304.730.617
PDF 215.426.417
Image 111.702.617
Text Document 32.916.617
Other 2.242

Data sets

The data set we used for the analisis was taked from OFFSHORE LEAKS DATABASE by The International Consortium of Investigative Journalists.

“The International Consortium of Investigative Journalists is a global network of more than 190 investigative journalists in more than 65 countries who collaborate on in-depth investigative stories.”

The data set is composed by five .csv files:

On the other hand, it was powered by Neo4j a graph database that structures data in nodes. The data base follows the folliwing structure:

We also used a .geo.json file for the interactive map provided by GeoJSON-maps with information about location and demographic aspect of each country.

Importing and Exploring

For importing the .csv files we use the read_csv() function from the readr package.

For the .geo.json file we use the geojson_read() from the geojsonio package, with the what argument equal to “sp”.

Edges

The Edges file keep the information about relationship between entity, officer, intermediaries and their adresses and is contained in the panama_papers.edges.csv file.

Using glimpse() function from dplyr package we display the structure of the imported files. We use glimpse() and not str() beacuse the visualization looks cleaner.

## Observations: 674,102
## Variables: 8
## $ START_ID    <chr> "10000035", "10000044", "10000055", "10000064", "1...
## $ TYPE        <fct> registered_address, registered_address, registered...
## $ END_ID      <chr> "14095990", "14091035", "14095990", "14091429", "1...
## $ link        <chr> "registered address", "registered address", "regis...
## $ start_date  <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ end_date    <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ sourceID    <chr> "Panama Papers", "Panama Papers", "Panama Papers",...
## $ valid_until <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...

As we see, this data frame has 8 variables and 674.102 observations. Variables are:

  • START_ID is an character that show, using a code number, the starting node of the tides between two nodes of different type.
  • TYPE is a factor that show the type of conection between the nodes. Possible values are “intermediary_of”, “officer_of” and “registered_address”.
  • END_ID is an character that show, using a code number, the ending node of the tides between two nodes.
  • link is a character that describe the type of relationship between the nodes in a deep way.
  • start_date starting date of the relationship. Most of the times is missing.
  • end_date ending date of the relationship. Most of the times is missing.
  • sourceID in this files the source is always “Panama Papers”.
  • valid_until

Using this file we obtained information about the frequency that any node appear.

Nodes

We combine all nodes data sets into one, keeping only useful variables using dplyr package.

## Observations: 559,600
## Variables: 4
## $ node_id       <chr> "10000001", "10000002", "10000003", "10000004", ...
## $ name          <chr> "TIANSHENG INDUSTRY AND TRADING CO., LTD.", "NIN...
## $ country_codes <chr> "HKG", "HKG", "HKG", "HKG", "HKG", "HKG", "HKG",...
## $ type          <chr> "entity", "entity", "entity", "entity", "entity"...

On the other hand, any nodes type has its owns atributes as we will see in the following part.

Entity Nodes

This file containe information about entity nodes.

## Observations: 213,634
## Variables: 17
## $ node_id                  <chr> "10000001", "10000002", "10000003", "...
## $ name                     <chr> "TIANSHENG INDUSTRY AND TRADING CO., ...
## $ jurisdiction             <chr> "SAM", "SAM", "SAM", "SAM", "SAM", "S...
## $ jurisdiction_description <chr> "Samoa", "Samoa", "Samoa", "Samoa", "...
## $ country_codes            <chr> "HKG", "HKG", "HKG", "HKG", "HKG", "H...
## $ countries                <chr> "Hong Kong", "Hong Kong", "Hong Kong"...
## $ incorporation_date       <date> 2006-03-23, 2006-03-27, 2006-01-10, ...
## $ inactivation_date        <date> 2013-02-18, 2014-02-27, 2012-02-15, ...
## $ struck_off_date          <date> 2013-02-15, 2014-02-15, 2012-02-15, ...
## $ closed_date              <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ ibcRUC                   <chr> "25221", "25249", "24138", "24012", "...
## $ status                   <fct> Defaulted, Defaulted, Defaulted, Defa...
## $ company_type             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ service_provider         <chr> "Mossack Fonseca", "Mossack Fonseca",...
## $ sourceID                 <chr> "Panama Papers", "Panama Papers", "Pa...
## $ valid_until              <chr> "The Panama Papers data is current th...
## $ note                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N...

The file is composed by 213.634 observations and 17 variables.

Variables are:

  • node_id is a character that has the numeric code that identify unequibocaly the node. node_id is the key of the Entity.
  • name is a character that has the name of the entity.
  • jurisdiction is a character
  • jurisdiction_descriptionis a character
  • country_codes is a character that contain the code of the country where the entity is located. The code is composed by three letters.
  • countriesis a character that has the name of the country where the entity is located.
  • incorporation_date incorporation date of the entity node.
  • inactivation_date inactivation date of the entity node.
  • struck_off_date struck off date of the entity node.
  • closed_date closed date of the entity node.
  • ibcRUC
  • status is a factor that show the actual stauts of the Entity. Possible values are “Active”, “Shelf company”, “Defaulted”, “Dissolved shelf company”, “Trash company”, “Changed agent” and “Dissolved”.
  • company_type
  • service_provider the original file was taken from Mossack Fonseca, so the value of this variable is alwas a character “Mossack Fonseca”
  • sourceID in this files the source is always “Panama Papers”.
  • valid_until
  • note

Adresses

This file containe information about know adresses of the nodes.

## Observations: 93,454
## Variables: 8
## $ node_id       <chr> "14000001", "14000002", "14000003", "14000004", ...
## $ name          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ address       <chr> "-\t27 ROSEWOOD DRIVE #16-19 SINGAPORE 737920", ...
## $ country_codes <chr> "SGP", "KAZ", "GBR", "GGY", "KEN", "MCO", "CHE",...
## $ countries     <chr> "Singapore", "Kazakhstan", "United Kingdom", "Gu...
## $ sourceID      <chr> "Panama Papers", "Panama Papers", "Panama Papers...
## $ valid_until   <chr> "The Panama Papers  data is current through 2015...
## $ note          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...

This file is composed by 93.454 observations and 8 variables.

Variables are:

  • node_id is a character that has the numeric code that identify unequibocaly the node. node_id is the key of the Entity.
  • name is a character that has the name of the adress.
  • address
  • country_codes
  • countries
  • sourceID in this files the source is always “Panama Papers”.
  • valid_until
  • note

Officer Nodes

## Observations: 238,402
## Variables: 7
## $ node_id       <chr> "12000001", "12000002", "12000003", "12000004", ...
## $ name          <chr> "KIM SOO IN", "Tian Yuan", "GREGORY JOHN SOLOMON...
## $ country_codes <chr> "KOR", "CHN", "AUS", "JPN", "VNM", "AUS", "PHL",...
## $ countries     <chr> "South Korea", "China", "Australia", "Japan", "V...
## $ sourceID      <chr> "Panama Papers", "Panama Papers", "Panama Papers...
## $ valid_until   <chr> "The Panama Papers data is current through 2015"...
## $ note          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...

This file is composed by 238.402 observations and 7 variables.

Variables are:

  • node_id is a character that has the numeric code that identify unequibocaly the node. node_id is the key of the Entity.
  • name is a character that has the name of the officer.
  • country_codes is a character that contain the code of the country where the office is located. The code is composed by three letters.
  • countries is a character that has the name of the country where the office is located.
  • sourceID in this files the source is always “Panama Papers”.
  • valid_until
  • note

Intermediary Nodes

## Observations: 14,110
## Variables: 8
## $ node_id       <chr> "11000001", "11000002", "11000003", "11000004", ...
## $ name          <chr> "MICHAEL PAPAGEORGE, MR.", "CORFIDUCIA ANSTALT",...
## $ country_codes <chr> "ZAF", "LIE", "MCO", "BEL", "LBN", "MCO", "CHE",...
## $ countries     <chr> "South Africa", "Liechtenstein", "Monaco", "Belg...
## $ status        <fct> ACTIVE, ACTIVE, SUSPENDED, SUSPENDED, ACTIVE, UN...
## $ sourceID      <chr> "Panama Papers", "Panama Papers", "Panama Papers...
## $ valid_until   <chr> "The Panama Papers  data is current through 2015...
## $ note          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...

This file is composed by 14.110 observations and 8 variables.

Variables are:

  • node_id is a character that has the numeric code that identify unequibocaly the node. node_id is the key of the Entity.
  • name is a character that has the name of the intermediary.
  • country_codes is a character that contain the code of the country where the intermediary is located. The code is composed by three letters.
  • countries is a character that has the name of the country where the intermediary is located.
  • status is a factor that show the actual status of the intermediary. Posible values are “ACTIVE”, “SUSPENDED”, “CLIENT IN REPRESENTATIVE TERRITORY”, “DELINQUENT”, “INACTIVE” and “UNRECOVERABLE ACCOUNTS”.
  • sourceID in this files the source is always “Panama Papers”.
  • valid_until
  • note

Intermediaries are companies that NEXOS DE UNION ENTRE OTRAS COMPANIAS that means there are in two countries. That is why sometimes intermediaries are between two countries. This is not good for our analysis, so using dplyr, stringr and rebus packages we add half of the number of intermediaries to each country

GeoJSON

This file contain all the necesary information for plot an interactive map plus demografic information, such as, population or GDP of every country. It has a very complicated structure so we are just going to display the structure of the data of each country.

## Observations: 244
## Variables: 66
## $ scalerank  <int> 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 3, 0, ...
## $ featurecla <fct> Admin-0 country, Admin-0 country, Admin-0 country, ...
## $ labelrank  <int> 4, 6, 3, 4, 3, 4, 3, 2, 6, 5, 3, 2, 4, 2, 5, 4, 4, ...
## $ sovereignt <fct> Botswana, Burundi, Angola, Central African Republic...
## $ sov_a3     <fct> BWA, BDI, AGO, CAF, CIV, COG, CMR, COD, COM, DJI, D...
## $ adm0_dif   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ level      <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
## $ type       <fct> Sovereign country, Sovereign country, Sovereign cou...
## $ admin      <fct> Botswana, Burundi, Angola, Central African Republic...
## $ adm0_a3    <fct> BWA, BDI, AGO, CAF, CIV, COG, CMR, COD, COM, DJI, D...
## $ geou_dif   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ geounit    <fct> Botswana, Burundi, Angola, Central African Republic...
## $ gu_a3      <fct> BWA, BDI, AGO, CAF, CIV, COG, CMR, COD, COM, DJI, D...
## $ su_dif     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ subunit    <fct> Botswana, Burundi, Angola, Central African Republic...
## $ su_a3      <fct> BWA, BDI, AGO, CAF, CIV, COG, CMR, COD, COM, DJI, D...
## $ brk_diff   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ name       <fct> Botswana, Burundi, Angola, Central African Rep., CÃ...
## $ name_long  <fct> Botswana, Burundi, Angola, Central African Republic...
## $ brk_a3     <fct> BWA, BDI, AGO, CAF, CIV, COG, CMR, COD, COM, DJI, D...
## $ brk_name   <fct> Botswana, Burundi, Angola, Central African Rep., CÃ...
## $ brk_group  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ abbrev     <fct> Bwa., Bur., Ang., C.A.R., I.C., Rep. Congo, Cam., D...
## $ postal     <fct> BW, BI, AO, CF, CI, CG, CM, DRC, KM, DJ, DZ, EG, ER...
## $ formal_en  <fct> Republic of Botswana, Republic of Burundi, People's...
## $ formal_fr  <fct> NA, NA, NA, NA, Republic of Cote D'Ivoire, NA, NA, ...
## $ note_adm0  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ note_brk   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ name_sort  <fct> Botswana, Burundi, Angola, Central African Republic...
## $ name_alt   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ mapcolor7  <int> 6, 2, 3, 5, 4, 2, 1, 4, 2, 1, 5, 4, 3, 4, 1, 1, 6, ...
## $ mapcolor8  <int> 5, 2, 2, 6, 6, 1, 4, 4, 1, 2, 1, 6, 1, 4, 2, 1, 2, ...
## $ mapcolor9  <int> 7, 5, 6, 6, 3, 3, 1, 4, 4, 4, 6, 7, 2, 1, 2, 4, 5, ...
## $ mapcolor13 <int> 3, 8, 1, 9, 3, 10, 3, 7, 10, 8, 3, 2, 12, 13, 12, 1...
## $ pop_est    <int> 1990876, 8988091, 12799293, 4511488, 20617068, 4012...
## $ gdp_md_est <dbl> 27060.0, 3102.0, 110300.0, 3198.0, 33850.0, 15350.0...
## $ pop_year   <int> -99, -99, -99, -99, -99, -99, -99, -99, -99, -99, -...
## $ lastcensus <int> 2011, 2008, 1970, 2003, 1998, 2007, 2005, 1984, 200...
## $ gdp_year   <int> -99, -99, -99, -99, -99, -99, -99, -99, -99, -99, -...
## $ economy    <fct> 6. Developing region, 7. Least developed region, 7....
## $ income_grp <fct> 3. Upper middle income, 5. Low income, 3. Upper mid...
## $ wikipedia  <int> -99, -99, -99, -99, -99, -99, -99, -99, -99, -99, -...
## $ fips_10_   <fct> BC, BY, AO, CT, IV, CF, CM, CG, CN, DJ, AG, EG, ER,...
## $ iso_a2     <fct> BW, BI, AO, CF, CI, CG, CM, CD, KM, DJ, DZ, EG, ER,...
## $ iso_a3     <fct> BWA, BDI, AGO, CAF, CIV, COG, CMR, COD, COM, DJI, D...
## $ iso_n3     <fct> 072, 108, 024, 140, 384, 178, 120, 180, 174, 262, 0...
## $ un_a3      <fct> 072, 108, 024, 140, 384, 178, 120, 180, 174, 262, 0...
## $ wb_a2      <fct> BW, BI, AO, CF, CI, CG, CM, ZR, KM, DJ, DZ, EG, ER,...
## $ wb_a3      <fct> BWA, BDI, AGO, CAF, CIV, COG, CMR, ZAR, COM, DJI, D...
## $ woe_id     <int> 23424755, 23424774, 23424745, 23424792, 23424854, 2...
## $ woe_id_eh  <int> 23424755, 23424774, 23424745, 23424792, 23424854, 2...
## $ woe_note   <fct> Exact WOE match as country, Exact WOE match as coun...
## $ adm0_a3_is <fct> BWA, BDI, AGO, CAF, CIV, COG, CMR, COD, COM, DJI, D...
## $ adm0_a3_us <fct> BWA, BDI, AGO, CAF, CIV, COG, CMR, COD, COM, DJI, D...
## $ adm0_a3_un <int> -99, -99, -99, -99, -99, -99, -99, -99, -99, -99, -...
## $ adm0_a3_wb <int> -99, -99, -99, -99, -99, -99, -99, -99, -99, -99, -...
## $ continent  <fct> Africa, Africa, Africa, Africa, Africa, Africa, Afr...
## $ region_un  <fct> Africa, Africa, Africa, Africa, Africa, Africa, Afr...
## $ subregion  <fct> Southern Africa, Eastern Africa, Middle Africa, Mid...
## $ region_wb  <fct> Sub-Saharan Africa, Sub-Saharan Africa, Sub-Saharan...
## $ name_len   <int> 8, 7, 6, 20, 13, 5, 8, 15, 7, 8, 7, 5, 7, 8, 5, 10,...
## $ long_len   <int> 8, 7, 6, 24, 13, 17, 8, 32, 7, 8, 7, 5, 7, 8, 5, 10...
## $ abbrev_len <int> 4, 4, 4, 6, 4, 10, 4, 6, 4, 4, 4, 5, 5, 4, 5, 5, 5,...
## $ tiny       <int> -99, -99, -99, -99, -99, -99, -99, -99, 2, -99, -99...
## $ homepart   <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ filename   <fct> BWA.geojson, BDI.geojson, AGO.geojson, CAF.geojson,...

Still it is to much for explain every variable, thats why we are just going to define the columns we will use in the future.

  • adm0_a3
  • pop_est
  • gdp_md_est
  • economy
  • iso_a3
  • continent

Analisis

Questions why will try to answer in this project:

Interactive map

Alcance de los papeles

As we see on the map, basically all the world is embolled in the Panama papers. the following table show the countries with the most number of cases.

Composed of:
Type Quantity
E-mail 480.461.817
Data base 304.730.617
PDF 215.426.417
Image 111.702.617
Text Document 32.916.617
Other 2.242

Of course is not right to speak about number of cases if we do not consider how many people us leving in the country.

Composed of:
Type Quantity
E-mail 480.461.817
Data base 304.730.617
PDF 215.426.417
Image 111.702.617
Text Document 32.916.617
Other 2.242

As we see,

Interactive Network

This inform was done for the “Data Management for Big Data” course from “Universit? degli studi di Trieste” year 2017-2018 by Jos? Agustin Spaccesi